I. Introduction

A  task  is  a  set  of  related  operations  which  can  be  performed  on  some  input  data. 
An  algorithm  is  a  collection  of  tasks  which,  once  performed,  will  have  accomplished  a 
well  defined  goal.  In  the  context  of  computing,  for  example,  a  task  may  be  a  function,  a 
subroutine,  or  a  process;  an  algorithm  is  a  complete  computer  program.  In  the  context  of 
a  manufacturing  plant,  a  task  may  be  milling  operations  of  a  part,  or  assembly  operations 
for  a  sub-assembly  of  a  product;  an  algorithm  is  the  complete  production  of  a  product.  As 
the  examples  indicate,  the  tasks  constituting  an  algorithm  are  such  that  they  may  require 
communication  of  results  or  synchronization  among  them.  This  imposes  a  partial  order 
relationship  in  the  execution  of  the  tasks  for  a  request  to  execute  the  algorithm.  We  shall 
use the term algorithm to refer to a collection of tasks with an underlying structure. A
major  aspect  of  this  structure  is  the  precedence  relationships  (i.e.,  the  partial  order  of 
execution)  among  the  tasks.  For  a  given  request  to  execute  an  algorithm,  all  the  tasks,  as 
specified  by  the  algorithm  and  its  structure,  are  performed  on  some  given  input  data. 

We  examine  two  models  of  processing  systems  which  perform  an  algorithm  for  a  stream 
of  execution  requests.  In  the  parallel  execution  model,  several  processors  are  provided 
which  run  asynchronously.  Each  processor  is  dedicated  to  perform  specific  tasks.  The 
system  is  such  that  different  requests  to  execute  the  algorithm  may  be  serviced  at  the 
same  time  at  different  processors  in  the  system.  This  property  is  referred  to  as  pipelined 
processing  of  requests.  The  system  also  allows  many  processors  to  simultaneously  execute 
their tasks for a given execution request, as long as the restrictions imposed by the
algorithm's task structure are satisfied. This property is referred to as concurrent processing of
tasks.  A  processing  system  which  follows  the  above  model  for  servicing  execution  requests 
is  referred  to  as  a  distributed  processing  system. 

In  the  sequential  execution  model,  a  single  processor  is  provided  which  can  perform  all 
tasks.  In  this  case,  the  processor  can  service  only  one  request  at  a  time.  When  servicing 
a request, the processor executes the tasks sequentially. We refer to such a system as a
centralized processing system.

In  this  report,  we  compare  the  performance  of  a  distributed  processing  system  with  that 
of  a  centralized  processing  system,  performing  an  algorithm  with  a  given  task  structure 
for  a  stream  of  execution  requests,  in  terms  of  the  average  execution  time  for  a  request. 
Average  execution  time  includes  waiting  time  at  the  processors  along  with  the  actual  time 
for  the  processors  to  perform  the  tasks.  The  performance  comparison  is  done  with  the 
constraint  that  the  total  capacity  of  computing  resources  is  the  same  for  both  processing 
systems.  This  constraint  can  be  seen  as  an  attempt  to  compare  the  performance  of  two 
systems  with  the  same  cost,  where  cost  is  considered  to  be  proportional  to  the  capacity 
of a processor. In this comparison, it is assumed that communications among processors
are instantaneous and free of cost. There exist other important criteria by which to compare
distributed  and  centralized  processing  systems,  such  as  reliability,  ease  of  system  design, 
maintainability,  flexibility  in  adapting  to  other  algorithms,  etc.  Such  criteria  are  not 
considered  in  this  report. 

Similar comparative studies have appeared in the literature for multi-server queues [1-4],
multiple resource systems [4], and a restricted type of distributed system [5]. Morse [1]
was  the  first  to  consider  an  optimization  problem  for  a  multi-server  queue.  He  considered 
an  M/M/m  queue  with  First-Come-First-Serve  (FCFS)  service  discipline  where  each  server 
services  a  job  at  rate  C/m  operations/sec.  He  found  that,  if  the  average  waiting  time  is 
to  be  minimized,  then  m  should  be  as  high  as  possible;  but  if  the  average  system  delay  is 
to  be  minimized,  then  m  should  be  equal  to  one.  Stidham  [2]  extended  the  above  result 
to G/M/m, G/D/m, and G/E_k/m queues. Brumelle [3] considered a G/G/m queue and
showed that, as long as the service time distribution has a coefficient of variation that is
less  than  or  equal  to  one,  the  average  system  delay  is  minimized  when  m  is  equal  to  one. 
Kleinrock  [4],  in  addition  to  surveying  these  results,  considered  a  system  where  each  server 
has its own queue of jobs. Assuming Poisson arrivals of jobs and exponential distributions
for the service times, he showed that a single queue with a single server of the combined
capacity  is  better  than  the  multiple  queue  system. 
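
The trade-off reported by Morse can be checked numerically. The sketch below is not taken from the report: it uses the standard Erlang C formula for an M/M/m FCFS queue, and the function names (`erlang_c`, `mmm_times`) and the parameter `total_rate` (the combined service rate of all servers, in jobs per second) are hypothetical names chosen for illustration. The total capacity is held fixed while it is split evenly among m servers.

```python
from math import factorial


def erlang_c(m, offered_load):
    """Probability that an arriving job must wait in an M/M/m queue (Erlang C)."""
    rho = offered_load / m
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    head = sum(offered_load ** k / factorial(k) for k in range(m))
    tail = offered_load ** m / (factorial(m) * (1 - rho))
    return tail / (head + tail)


def mmm_times(lam, total_rate, m):
    """Return (mean waiting time, mean system delay) for m servers,
    each serving at rate total_rate / m jobs per second."""
    per_server = total_rate / m
    wait = erlang_c(m, lam / per_server) / (total_rate - lam)
    return wait, wait + 1 / per_server


# Same total capacity, split among a growing number of servers.
for m in (1, 2, 4, 8):
    wait, delay = mmm_times(lam=0.5, total_rate=1.0, m=m)
    print(m, round(wait, 4), round(delay, 4))
```

Under these assumptions the mean waiting time shrinks as m grows while the mean system delay grows, which is the behaviour described above.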

The  distributed  system  considered  here  is  more  general  than  the  systems  mentioned 
above in the sense that each request for the execution of the algorithm will, in general,
require service from more than one processor. There has been a recent paper by Kleinrock
[5] which considers such a distributed system. The system analyzed in [5] is somewhat
restricted,  in  the  sense  that  only  algorithms  which  consist  of  a  series  of  tasks  are  considered. 
The  model  of  precedence  relationships  developed  by  us  is  more  general  and  contains  the 
sequential  precedence  relationship  used  in  [5]. 

For  the  remainder  of  the  report  we  shall  proceed  as  follows.  In  Section  2,  we  define 
various  types  of  simple  structures  for  algorithms,  a  structural  graph,  and  the  execution 
model  of  an  algorithm.  In  Section  3,  we  define  distributed  and  centralized  processing 
systems  and  their  respective  models.  In  Section  4,  we  compare  a  distributed  processing 
system  to  a  centralized  system  for  various  types  of  service  distributions,  service  disciplines, 
and  various  types  of  structures  of  algorithms.  The  results  obtained  are  summarized  as 
three  propositions.  In  addition,  we  consider  examples  which  indicate  which  of  the  possible 
generalizations  of  these  propositions  are  not  valid.  In  Section  5,  we  summarize  the  results 
obtained  and  discuss  further  work  needed  to  use  the  analytical  model  developed  in  this 
study  for  the  design  of  distributed  systems. 


II.  Algorithms  and  Structural  Graphs 

As  defined  in  the  introduction,  a  task  consists  of  a  set  of  operations  which  can  be 
performed  on  some  input  data;  an  algorithm  is  a  collection  of  tasks  with  an  underlying 
structure. A major aspect in the definition of the structure of an algorithm is the precedence
relationships (i.e., the partial order of execution) that exist among its tasks.

2.1  Types  of  Precedence  Relationships 

We  define  five  basic  types  of  precedence  relationships.  In  the  following  definitions,  the 
symbols $F_a$ and $F_b$ are used to indicate two distinct tasks, and $\{F_x\}$ and $\{F_y\}$ are used to
indicate  two  distinct  sets  of  tasks  which  may  have  some  common  members. 

(a) Sequential Relationship: A Sequential relationship $S: F_a \to F_b$ specifies that for
any request to execute these two tasks, task $F_a$ must be completed before task $F_b$ may
begin. Following the completion of $F_a$, task $F_b$ is executed.

(b) If-then-else Relationship: An If-then-else relationship $IF: F_a \to \{F_x\}$ specifies
that for any request to execute these tasks, task $F_a$ must be completed before any task
in $\{F_x\}$ may begin. Following the completion of $F_a$, one and only one task in $\{F_x\}$ is
selected for execution according to some selection procedure.

(c) Merge Relationship: A Merge relationship $M: \{F_x\} \to F_b$ specifies that for any
request to execute these tasks, any task in $\{F_x\}$ must be completed before task $F_b$ may
begin. It is assumed here that for any such execution request, only one task in $\{F_x\}$ would
be actually executed. Following the completion of this execution, task $F_b$ is executed.

(d) Fork Relationship: A Fork relationship $F: F_a \to \{F_y\}$ specifies that for any
request to execute these tasks, task $F_a$ must be completed before tasks in $\{F_y\}$ may begin.
Following the completion of $F_a$, all tasks in $\{F_y\}$ are executed.

(e) Join Relationship: A Join relationship $J: \{F_x\} \to F_b$ specifies that for any request
to execute these tasks, all tasks in $\{F_x\}$ must be completed before task $F_b$ may begin. It is
assumed here that for any such execution request, all tasks in $\{F_x\}$ would be executed. The
order in which the tasks of $\{F_x\}$ are completed is not relevant. Following the completion
of tasks $\{F_x\}$, task $F_b$ is executed.

A graphical notation used to represent the various types of precedence relationships
is given in Figure 1. In this notation, nodes represent tasks, and the directed arcs among
nodes (along with the special symbols) represent the order of execution of the tasks.

2.2  Structural  Graphs 

The  structure  of  an  algorithm  is  represented  by  a  graph  based  on  the  graphical  notation 
defined  above.  Such  a  graph  is  called  the  structural  graph  of  the  algorithm. 

Using  the  relationships  defined  above  as  primitives,  one  can  build  some  general  and 
complex  structural  graphs.  However,  though  not  readily  apparent  from  their  definitions, 
these  primitives  may  be  combined  to  lead  to  some  “ill-behaved”  structures.  Figure  2  shows 
two  generic  examples  of  such  structures.  Figure  2  (a)  is  an  example  of  a  structure  in  which 
a  task  will  be  executed  an  infinite  number  of  times  for  a  single  request  to  execute  the 
algorithm.  Figure  2  (b)  is  an  example  of  a  structure  where  the  conditions  necessary  to 
execute  a  task  are  never  met,  thus  leading  to  a  deadlock.  In  this  report,  we  consider 
algorithms  that  are  well-behaved;  i.e.,  do  not  have  deadlocks  or  infinite  execution  times. 

Algorithms  in  which  a  task  is  never  executed  more  than  once  for  a  given  external 
execution  request  are  known  as  acyclic  algorithms.  Examples  of  acyclic  and  non-acyclic 
algorithms  are  shown  in  Figure  3.  In  this  report,  we  restrict  algorithms  to  be  acyclic. 
Our  analysis  of  acyclic  algorithms  can  be  extended  to  include  non-acyclic  algorithms,  but 
only  if  the  number  of  times  a  task  is  executed  for  a  request  to  execute  the  algorithm  is 
deterministic  and  independent  of  the  input  data. 
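
As a rough illustration of the acyclicity restriction, the sketch below tests whether a directed graph contains a cycle using Kahn's topological-sort algorithm. The encoding of a structural graph as a plain mapping from each task index to the list of its immediate successors is an assumption made here, not the report's notation, and the check ignores the relationship types attached to the arcs. A cycle-free graph guarantees that no task can be reached twice along a single chain of arcs.

```python
from collections import deque


def is_acyclic(successors):
    """Return True if the directed graph has no cycle (Kahn's algorithm).

    successors: hypothetical encoding, dict mapping a task index to the
    list of task indices that immediately follow it.
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)

    indegree = {n: 0 for n in nodes}
    for targets in successors.values():
        for t in targets:
            indegree[t] += 1

    # Repeatedly remove nodes whose predecessors have all been removed.
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        n = ready.popleft()
        removed += 1
        for t in successors.get(n, ()):
            indegree[t] -= 1
            if indegree[t] == 0:
                ready.append(t)

    # Nodes that lie on a cycle are never removed.
    return removed == len(nodes)
```

For example, `is_acyclic({1: [2, 3], 2: [4], 3: [4]})` holds, whereas a graph in which task 3 leads back to task 2 fails the check.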

Figure 1. Types of precedence relationships. a) Sequential, b) If-then-else, c) Fork, d) Merge, e) Join.

Figure 2. “Ill-behaved” structural graphs. a) Infinite execution, b) Deadlock.

Figure 3. a) An acyclic algorithm, b) A non-acyclic algorithm.

Based on the types of relationships used, structural graphs are divided into three
non-overlapping classes: SIFM, PERT, and General. Any structure which has only Sequential,
If-then-else, or Merge types of relationships belongs to the SIFM class. Any structure
which has only Sequential, Fork, or Join types of relationships belongs to the PERT class.
(This class is named PERT because the structures in it resemble PERT Activity Networks
[9].) Any structure that does not fit in these two classes belongs to the General class.
Figure 4 shows an example of each class of structure.
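
The classification rule can be written down compactly. The following is a minimal sketch, assuming a structure is summarized by the collection of relationship types it uses; the names `Rel` and `classify` are hypothetical. The report does not say how a structure built only from Sequential relationships should be classified, so the sketch arbitrarily reports it as SIFM; that tie-break is an assumption.

```python
from enum import Enum


class Rel(Enum):
    SEQUENTIAL = "S"
    IF_THEN_ELSE = "IF"
    MERGE = "M"
    FORK = "F"
    JOIN = "J"


SIFM_TYPES = {Rel.SEQUENTIAL, Rel.IF_THEN_ELSE, Rel.MERGE}
PERT_TYPES = {Rel.SEQUENTIAL, Rel.FORK, Rel.JOIN}


def classify(relationships):
    """Return 'SIFM', 'PERT', or 'General' for the relationship types used."""
    used = set(relationships)
    if used <= SIFM_TYPES:
        # Assumption: a purely sequential structure is reported as SIFM.
        return "SIFM"
    if used <= PERT_TYPES:
        return "PERT"
    return "General"
```

A structure that mixes, say, an If-then-else with a Fork falls outside both subsets and is reported as General.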

Structural  graphs  can  also  be  characterized  according  to  the  type  of  topology.  Three 
types  of  topologies  can  be  identified:  purely  serial  (PS),  purely  parallel  (PP),  and  general 
(G).  Figure  5  shows  simple  examples  of  these  three  types  of  topologies. 

For  brevity,  the  type  of  a  structural  graph  may  be  referred  to  by  an  ordered  list  of 
its  attributes;  namely,  type  of  precedence  relationships,  type  of  topology,  and  number  of 
tasks.  For  instance,  the  structural  graph  shown  in  Figure  5  (b)  is  a  SIFM/PP/4  structure 
of  an  algorithm. 

The  model  of  an  algorithm  and  its  structure  defined  here  can  be  extended  in  many 
ways.  For  instance,  one  may  allow  the  completion  of  some  task  to  force  another  ongoing 
task  to  complete  its  execution,  or  one  may  allow  a  task  to  begin  its  execution  but  wait  for 
other  tasks  to  complete  their  executions  before  its  own  execution  is  completed.  We  have 
not  considered  such  a  general  model  because  the  analysis  presented  in  this  report  is  not 
adaptable  to  systems  with  these  characteristics. 

2.3 Execution of an Algorithm

Consider an algorithm A consisting of tasks $\{F_i, i = 1, 2, \ldots, m\}$ and structural graph
G.  Requests  to  execute  the  algorithm  arrive  from  some  external  source  according  to  some 
general  arrival  process  A. 

In  an  SIFM  type  algorithm,  an  external  request  to  execute  the  algorithm  may  be 
directed  to  any  task  in  the  graph;  i.e.,  the  request  may  be  for  the  execution  of  the  algorithm 
starting  at  any  particular  task.  Given  the  nature  of  the  SIFM  type  algorithm,  no  deadlocks 
will  occur,  and  completion  of  the  algorithm  for  each  input  is  well-defined. 

Figure 4. Classes of structures. a) SIFM, b) PERT, c) General.

Figure 5. Types of topologies of structures. a) Purely Serial (PS): SIFM/PS/4, b) Purely Parallel (PP): SIFM/PP/4, c) General (G): SIFM/G/4.

In an arbitrary PERT structure, the above does not hold. The arrival of an external
request to a particular task in the graph may lead to a deadlock, and thus to an ill-defined
notion of algorithm execution. An example of such a situation is depicted in Figure 6,
where the arrival of an external request to begin execution at task $F_1$ cannot be serviced.
An external request must be such that both tasks $F_1$ and $F_2$ are executed (not necessarily
simultaneously). Thus for PERT structures, we shall assume that there exists a single task
from which all tasks in the structure can be accessed. (Note that in a PERT structure,
the execution of a task results in the execution of all tasks which are downstream of it.) If
there is no such node in the graph, then one can add a “phantom” node $F_0$ (consisting
of no operations) to the structure with the appropriate Fork relationship. (See example in
Figure 7.) Note that the phantom node $F_0$ can also be added even when the single starting
node, say $F_1$, exists, simply by using the Sequential relationship $S: F_0 \to F_1$.
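
The phantom-node construction for a PERT structure can be sketched as follows. The encoding (a mapping from each task index to its immediate successors) and the function name `add_phantom_root` are assumptions made for illustration, and the graph is assumed to be acyclic so that at least one task without predecessors exists. The phantom task is given index 0 and is connected to every task that has no predecessor: with several such tasks this corresponds to a Fork, and with exactly one it reduces to the Sequential relationship mentioned above.

```python
def add_phantom_root(successors, phantom=0):
    """Return a copy of the graph with a phantom start task added.

    successors: hypothetical encoding, dict mapping a task index to the
    list of task indices that immediately follow it.
    """
    nodes = set(successors)
    for targets in successors.values():
        nodes.update(targets)
    if phantom in nodes:
        raise ValueError("phantom index is already used by a task")

    # Tasks without predecessors are the ones an external request must start.
    with_predecessor = {t for targets in successors.values() for t in targets}
    sources = sorted(nodes - with_predecessor)

    augmented = {n: list(targets) for n, targets in successors.items()}
    augmented[phantom] = sources
    return augmented
```

For a structure in which tasks 1 and 2 both precede task 3, `add_phantom_root({1: [3], 2: [3]})` adds the entry `0: [1, 2]`, so that a single external request to task 0 triggers both tasks.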

Without loss of generality, we shall also consider that each SIFM structure contains
such a phantom node $F_0$, and an If-then-else relationship $IF: F_0 \to \{F_i, i = 1, 2, \ldots, m\}$.
(See Figure 8.) The selection of the task $F_i$, $1 \le i \le m$, to follow $F_0$ is deduced from the
arrival process A. The same comments hold for general structures. In general one may
have to introduce more than a single phantom node. An example is depicted in Figure 9.
Therefore we shall assume that all external requests from process A are directed to node
$F_0$.

The execution time for an external request is the period of time between the instant
of its arrival to the system and the instant at which all tasks associated with that request
have completed execution, starting with $F_0$.
III. Models of Distributed and Centralized Processing Systems

3.1 Model of a Distributed Processing System

Consider an algorithm consisting of a collection of $m$ tasks $\{F_i, i = 1, 2, \ldots, m\}$. A
processing system of $m$ processors $\{P_i, i = 1, 2, \ldots, m\}$ is provided such that processor
$P_i$, $1 \le i \le m$, can perform only task $F_i$. We also consider the existence of a virtual
processor $P_0$ performing task $F_0$. The processors run asynchronously, but can communicate
with each other over a common channel. Each processor is modelled as a single server
with infinite waiting room. Once a processor has begun execution of a task, it will run
independently and will complete this instance of task execution without any further
synchronization among processors. The processor queues any other requests received during
its execution of the current task, and acts upon them after completing the execution of
this task. Processor $P_i$ performs operations at the rate of $C_i$ operations per second. We
assume that the server does not go idle when requests for execution of the task are
waiting, nor does a request leave the processor before receiving full service. Different service
disciplines  may  be  considered  at  the  various  processors.  The  service  disciplines  considered 
in  this  report  are  First-Come-First-Serve  (FCFS),  Last-Come-First-Serve  with  Preemptive 
Resume  (LCFS-PR),  and  Processor-Sharing  (PS). 

Consider  a  stream  of  external  requests  to  execute  the  algorithm,  which  arrive  according 
to arrival process A. We number these requests sequentially. Let $E_i$, $i = 1, 2, \ldots$, denote
the $i$th external request. (Recall that, without loss of generality, we have restricted external
requests to arrive at node $F_0$.) In servicing an external request $E_i$, requests to perform
specific tasks in the algorithm are generated. Define a job to be a request to execute a task.
Jobs can be uniquely identified by the notation $J_{i,j}$, where $i$ represents the external request,
and $j$ the particular task, to which this job corresponds. Note that jobs $\{J_{i,0}, i \ge 1\}$ are
created externally according to the arrival of external execution requests, and there exists
one such $J_{i,0}$ for each external request $E_i$. All other jobs $J_{i,j}$, $j \ne 0$, are created internally
as a result of the completion of some other jobs already in the system, according to the
structure of the algorithm. Depending on the types of relationships used in the graph, not
all tasks defined in the graph are executed for each execution request. Hence for a given
$E_i$, not all jobs $J_{i,j}$, $1 \le j \le m$, are created. We define the execution time for request $E_i$
as the time it takes for all jobs $J_{i,j}$ which have been created to be completed, following
the creation of $J_{i,0}$.

We now make some simple observations. In an SIFM type graph, one and only one job
$J_{i,j}$, for some $j$, $1 \le j \le m$, exists for a given request $E_i$ at any one time during the execution
time of $E_i$. Hence the execution time of $E_i$ is the time it takes for a specific sequence of jobs
$J_{i,0}, J_{i,j_1}, \ldots, J_{i,j_n}, \ldots$ to be executed. Note that this model allows pipelined processing of
successive execution requests.

In a PERT type graph, all jobs $J_{i,j}$, $1 \le j \le m$, must have been created sometime
during the execution time of request $E_i$. It is also possible at any one time to find more
than one job $J_{i,j}$, $1 \le j \le m$, in the system. Note that this model not only allows pipelined
processing of successive execution requests, but also allows concurrent processing of tasks
to be executed for an execution request.

We now define the arrival process A. We assume that external arrivals of requests (to
node $F_0$) are according to an aggregate Poisson process with rate $\lambda$ requests/second. In the
SIFM type graphs, we also consider that following $F_0$, task $F_i$, $1 \le i \le m$, is selected with
probability $q_i$. We furthermore assume that the selection of the next task in an If-then-else
relationship is random according to a fixed distribution, independent of previous selections.

The number of operations associated with a particular job $J_{i,j}$ is a random variable
$Z_{i,j}$ which is continuous over the positive real line*. Furthermore, we assume that $Z_{i,j}$ is
independent from all other random variables $Z_{k,l}$, $k \ne i$, $1 \le l \le m$, and that, for a given $j$,
all the random variables $Z_{i,j}$, $i = 1, 2, \ldots$, that are created are independent and identically
distributed, with distribution $B_j(x)$. The mean number of operations performed for job
$J_{i,j}$ is denoted by $1/\mu_j$. The mean number of operations performed for a request is
denoted by $1/\mu$. This parameter is independent of the system or its model, and is given
in terms of $\mu_j$, $j = 1, 2, \ldots, m$, according to the structure of the algorithm and the arrival
process.

*This assumption of $Z_{i,j}$ being a continuous random variable is reasonable if we assume that the discrete
time slot needed to perform each operation is quite small compared to the total service time per task.
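
To illustrate how the mean number of operations per request follows from the per-task means, here is a small sketch based only on linearity of expectation. It is not the report's own derivation. The function names, the `mean_ops` mapping (task index to mean number of operations for that task), and the representation of an SIFM algorithm as a list of (probability, task sequence) pairs are all assumptions introduced for the example. In a PERT structure every task is executed once per request; in an SIFM structure exactly one task sequence is followed.

```python
def mean_ops_pert(mean_ops):
    """Mean operations per request when every task is executed exactly once."""
    return sum(mean_ops.values())


def mean_ops_sifm(sequences, mean_ops):
    """Mean operations per request when one task sequence is chosen at random.

    sequences: list of (probability, [task indices]) pairs; the
    probabilities are assumed to sum to one.
    """
    return sum(p * sum(mean_ops[j] for j in tasks) for p, tasks in sequences)


# Hypothetical example: three tasks needing 2, 3 and 5 operations on average.
mean_ops = {1: 2.0, 2: 3.0, 3: 5.0}
print(mean_ops_pert(mean_ops))
print(mean_ops_sifm([(0.5, [1, 2]), (0.5, [1, 3])], mean_ops))
```

The reciprocal of either result plays the role of the parameter written above as the mean-operations rate for a request.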

3.2 Models of a Centralized System

A  single  processor  is  provided  which  can  perform  all  tasks.  The  processor  is  modelled 
as  a  single  server  of  capacity  C  operations  per  second  with  infinite  waiting  room  for  jobs. 
On  receiving  a  request  to  execute  the  algorithm,  the  processor  will  execute  in  sequence  all 
tasks  that  are  to  be  performed  for  this  request.  While  servicing  a  request,  the  processor 
will  queue  all  other  requests  received. 

We consider two alternative models for the service time in a centralized system: a
single-stage model and a multi-stage model. In the single-stage model, we view the service
time as a single stage of service with a distribution which is of the same type as that
of individual tasks (assuming all task service distributions $B_j(x)$, $j = 1, 2, \ldots, m$, in the
distributed system to be of the same type, e.g. Exponential, Uniform, Hyper-exponential,
etc.), and which has a mean service time denoted by $1/(\mu C)$, where $C$ is the capacity
of the processor, and $1/\mu$ is the average number of operations that will be executed for
a request. This is the model used by Kleinrock [5], with all distributions assumed
exponential. By considering only the average number of operations performed for an
execution request in the distributed system, this model ignores the higher moments of the
service time (which are determined either by the particular choice of the distribution, as is
the case with the exponential distribution, or arbitrarily). We consider the single-stage
model to be a first-order approximation.

In the multi-stage model, the service is viewed as a series-parallel arrangement of
stages,  where  each  stage  corresponds  to  a  service  time  of  a  task.  Here  different  stages  may 
have  different  distributions.  For  any  PERT  type  graph,  the  service  time  is  modelled  as 
a  single  series  of  stages,  and  the  order  of  stages  in  the  model  is  selected  arbitrarily  but 
so  as  to  preserve  the  partial  order  of  execution  of  the  tasks,  as  specified  by  the  structure 
of  the  algorithm.  For  any  SIFM  type  graph,  the  service  time  is  modelled  as  a  number  of 
parallel  branches  of  series  of  stages,  where  each  branch  corresponds  to  a  distinct  sequence 
of  jobs  to  be  executed  for  a  request.  The  total  number  of  branches  is  determined  by  the 
structural  graph  and  by  the  arrival  process  A.  We  consider  this  model  to  be  accurate  in 
the  sense  that  it  takes  into  account  the  higher  moments  of  the  service  time  distributions 
of  individual  tasks. 
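
The effect of the higher moments can be made concrete for one simple case. The sketch below is an illustration under assumptions that go beyond the text: arrivals are Poisson, the discipline is FCFS, every task has an exponential service time, and the structure is a single series of stages. It applies the standard Pollaczek-Khinchine mean-value formula for an M/G/1 queue, which is not quoted from the report, and the function names are hypothetical. The argument `stage_means` holds the mean service time of each stage on the centralized processor.

```python
def pk_mean_delay(lam, mean_s, second_moment_s):
    """Mean system delay of an M/G/1 FCFS queue (Pollaczek-Khinchine)."""
    rho = lam * mean_s
    if rho >= 1:
        raise ValueError("unstable queue: utilization must be below 1")
    return mean_s + lam * second_moment_s / (2 * (1 - rho))


def single_stage_delay(lam, stage_means):
    """Single-stage model: one exponential stage with the same total mean."""
    mean_s = sum(stage_means)
    # An exponential variable with mean s has second moment 2 * s**2.
    return pk_mean_delay(lam, mean_s, 2 * mean_s ** 2)


def multi_stage_delay(lam, stage_means):
    """Multi-stage model: a series of independent exponential stages."""
    mean_s = sum(stage_means)
    variance = sum(s * s for s in stage_means)  # variances of the stages add
    return pk_mean_delay(lam, mean_s, variance + mean_s ** 2)


print(single_stage_delay(0.5, [0.5, 0.5]))
print(multi_stage_delay(0.5, [0.5, 0.5]))
```

With two equal stages the series arrangement has a smaller service-time variance than a single exponential stage of the same mean, so the multi-stage model predicts a shorter delay for the same centralized processor.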

The  definition  of  execution  time  in  a  centralized  system  is  clear  since  the  processor 
services  one  request  at  a  time.  The  performance  comparison  between  the  distributed 
system  and  the  centralized  system  is  based  on  the  average  execution  time  of  a  request 
with  both  systems  having  the  identical  arrival  process.  Figure  10  shows  the  model  of  the 
distributed  processing  system  and  the  models  of  the  centralized  processing  system  of  a 
SIFM/G/4  algorithm. 

3.3 Notation

As  seen  from  the  previous  section,  a  processing  system  has  many  attributes;  these  are: 

a) the type of system model (distributed [d], centralized single-stage [cs], or centralized
multi-stage [cm]),

b) the type of precedence relationships (SIFM, PERT, G),

c) the type of topology (PP, PS, G),

d) the number of tasks in the algorithm (m),

e) the type of distribution of inter-arrival times of jobs from outside (Poisson [M],
Deterministic [D], General [G]),

f) the type of distributions of task services (M, D, G), and

g) the service disciplines within processors (FCFS, LCFS-PR, PS etc.).

Figure 10. Models of the distributed and the centralized processing systems for a
SIFM/G/4 algorithm. a) A SIFM/G/4 algorithm, b) A d/SIFM/G model, c) A cs/SIFM/G model,
d) A cm/SIFM/G model.

In order to concisely refer to processing systems with the same attributes, we list the
acronyms corresponding to the attributes in the order listed above (e.g.,
d/SIFM/G/m/M/M/FCFS). The last four of these attributes may be left out at times,
if their specification is clear from the context. The average execution time in a system is
denoted by a subscripted T, where the subscript consists of the notation that specifies the
system. We also use $T_d$, $T_{cs}$, and $T_{cm}$, a more concise notation, when the values of the
other system attributes are clear from the context. The ratio of the distributed execution
time to the sequential execution time is denoted as $R_{(d,cs)}$ or $R_{(d,cm)}$, depending on the
model used for the centralized system. A summary of the notation used in the report is
given in Appendix 1.


IV. Distributed versus Centralized Processing Systems

When the resources are limited with the linear constraint, namely $\sum_{i=1}^{m} C_i = C$,
Kleinrock [5] states that distributed processing is slower than centralized processing. This was
shown  to  be  true  for  algorithms  which  consist  of  a  single  series  of  tasks.  The  centralized 
system  was  modelled  as  a  single-stage  service  model,  and  the  distributions  for  service  times 
and  inter-arrival  times  were  assumed  to  be  exponential. 

In  this  section,  we  show  that  while  the  above  result  is  true  even  for  the  more  general 
structures of SIFM and PERT, there are cases of algorithms and models for processing
systems, namely the multi-stage model for centralized processing systems, where the opposite
result is true; that is, distributed processing outperforms centralized processing. We
proceed to demonstrate this fact in the following manner. In Section 4.1, we prove the
optimality of centralized processing for a class of algorithms, namely SIFM and PERT
algorithms, under the single-stage model. The assumptions made in this section are the
same as the ones used in [5], and the result obtained here is merely an extension of the
result obtained in [5] to more general structures. In Section 4.2, while retaining the
assumptions made in Section 4.1, we consider the multi-stage model of centralized systems.
In  this  case,  the  result  is  shown  to  be  true  only  for  a  restricted  range  of  parameters.  In  this 
section,  we  also  give  an  example  which  shows  that  there  are  values  for  which  a  distributed 
system  is  better  than  a  centralized  one.  In  Section  4.3,  we  extend  the  results  derived  in 
Section  4.2  to  two  non-FCFS  service  disciplines,  and  to  general  distributions  of  service 
times. 

4.1  Centralized  Systems  with  Single-stage  Service 

The  systems  considered  in  this  section  have  the  characteristics  that  (1)  the  external 
arrival  process  is  Poisson,  (2)  each  processor  uses  FCFS  service  discipline,  (3)  the  service 
distributions are exponential, (4) the single-stage service model is considered for the
centralized system, and (5) the resources are limited such that $\sum_{i=1}^{m} C_i = C$. Under these
conditions, the average execution time in a distributed system for a SIFM or a PERT
algorithm is higher than that of the centralized system executing the same algorithm. This
proposition  is  true  regardless  of  the  structure  of  the  algorithm.  This  can  be  stated  more 
concisely  as 
Proposition 1

(i) $T_{\text{d/SIFM/G/m/M/M/FCFS}} > T_{\text{cs/SIFM/G/m/M/M/FCFS}}$

(ii) $T_{\text{d/PERT/G/m/M/M/FCFS}} > T_{\text{cs/PERT/G/m/M/M/FCFS}}$

Proof: 

Part (i)

To prove part (i), let us consider an arbitrary SIFM/G/m algorithm. Our first task is
to find the average execution time for the distributed system, $T_d$. Given the assumptions,
the  model  of  the  distributed  system  is  analogous  to  the  network  of  queues  model  analyzed 
by  Jackson  [6].  He  showed  that  the  distribution  of  the  number  of  jobs  in  the  system  equals 
the  product  of  the  marginal  distribution  of  the  number  of  jobs  at  each  queue,  and  the 
marginal  distribution  is  the  same  as  that  of  the  distribution  of  jobs  in  an  independent 
M/M/1  queue.  Such  networks  of  queues  are  referred  to  as  having  a  product-form  solution. 

We  define  the  notions  of  route  and  class  of  an  external  request  as  follows.  In  the 
distributed  system,  a  request  generates  a  sequence  of  jobs  to  be  executed.  We  define  the 
route  of  a  request  as  the  sequence  of  processors  that  execute  the  sequence  of  jobs  generated 
by a request. A request is classified according to its route. Let $r$, $r = 1, 2, \ldots, R$, denote
the class of a request, and let $a_r$ denote the route of a request of class $r$. We shall use
$i \in a_r$ to indicate that processor $P_i$ is in route $a_r$.

Let requests arrive at a rate of $\lambda$ jobs per second. Given the defined arrival process
$\lambda$, and the random selection of downstream tasks in the If-then-else precedence relationship
at the "phantom" node, an arrival is of class $r$ with some fixed probability; hence, the
arrivals of requests of class $r$ also form a Poisson process. Let the rate of such arrivals be
$\lambda_r$. Then,

$$\lambda = \sum_{r=1}^{R} \lambda_r. \tag{1}$$

Let $Y_i$ be the average execution time of a job at processor $P_i$. As shown by Kleinrock
[p. 321, 4], the average execution time is expressed in terms of $Y_i$ as

$$T_d = \sum_{i=1}^{m} \frac{\alpha_i}{\lambda} Y_i \tag{2}$$

where

$$\alpha_i = \sum_{\substack{r=1 \\ i \in a_r}}^{R} \lambda_r \tag{3}$$

is the total rate of arrivals to processor $P_i$. Because the underlying network of queues model
has a product-form solution, $Y_i$ is equal to the average system delay in an independent
M/M/1 system that has arrival rate $\alpha_i$ and service rate $\mu_i C_i$. Thus, we have

$$Y_i = \frac{1}{\mu_i C_i - \alpha_i}. \tag{4}$$

Eqs. (2), (3), and (4) together give us the closed-form solution of the average execution
time in the distributed system of any SIFM algorithm.
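
Eqs. (2)-(4) translate directly into a short computation. The following Python sketch is only an illustration: the function name `distributed_mean_time`, the zero-based processor indices, and the dictionary layout used for routes and class rates are assumptions made here and do not appear in the report.

```python
from typing import Dict, List, Sequence


def distributed_mean_time(routes: Dict[int, Sequence[int]],
                          class_rates: Dict[int, float],
                          mu: List[float],
                          cap: List[float]) -> float:
    """Mean execution time T_d of a request in the distributed system.

    routes[r]      -- processors (0-based, hypothetical encoding) on route a_r
    class_rates[r] -- arrival rate lambda_r of class r
    mu[i], cap[i]  -- mu_i and C_i, so that mu_i * C_i is the service rate
    """
    lam = sum(class_rates.values())          # Eq. (1): total external rate
    alpha = [0.0] * len(mu)
    for r, route in routes.items():          # Eq. (3): rate seen by each P_i
        for i in set(route):
            alpha[i] += class_rates[r]
    t_d = 0.0
    for a_i, mu_i, c_i in zip(alpha, mu, cap):
        if a_i == 0.0:
            continue                         # an unused processor adds no delay
        if mu_i * c_i <= a_i:
            raise ValueError("unstable queue: need mu_i * C_i > alpha_i")
        y_i = 1.0 / (mu_i * c_i - a_i)       # Eq. (4): M/M/1 system delay
        t_d += (a_i / lam) * y_i             # Eq. (2): weighted sum of delays
    return t_d


# Made-up example: class 1 visits P_0 only, class 2 visits P_0 and then P_1.
example_t_d = distributed_mean_time(
    routes={1: [0], 2: [0, 1]},
    class_rates={1: 1.0, 2: 2.0},
    mu=[1.0, 1.0],
    cap=[5.0, 4.0],
)
```

With these made-up numbers the arrival rates are (3, 2), both per-processor delays equal 0.5, and `example_t_d` evaluates to 5/6.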


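Our next task is to find the average execution time $T_{cs}$. When a centralized system is
assumed to have a single-stage service facility, this quantity is the average system delay in
an M/M/1 system with arrival rate $\lambda$ and service rate $\mu C$, where the parameter $\mu$ is given
by

$$\frac{1}{\mu} = \sum_{r=1}^{R} \sum_{i \in a_r} \frac{\lambda_r}{\lambda} \frac{1}{\mu_i} = \sum_{i=1}^{m} \frac{\alpha_i}{\lambda} \frac{1}{\mu_i}. \tag{5}$$

The average execution time, $T_{cs}$, is thus given by

$$T_{cs} = \frac{1}{\mu C - \lambda} = \frac{\sum_{i=1}^{m} (\alpha_i/\mu_i)}{\lambda \left[ C - \sum_{i=1}^{m} (\alpha_i/\mu_i) \right]}. \tag{6}$$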
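
As a hedged illustration of the single-stage centralized model, the sketch below evaluates the parameter $\mu$ and the M/M/1 delay $1/(\mu C - \lambda)$ in the two equivalent forms described above. The function names and argument layout are hypothetical choices made for this example.

```python
from typing import List


def centralized_rate_parameter(alpha: List[float], mu: List[float],
                               lam: float) -> float:
    """Parameter mu of the single server: 1/mu = sum_i (alpha_i/lam)(1/mu_i)."""
    return lam / sum(a / u for a, u in zip(alpha, mu))


def centralized_single_stage_time(alpha: List[float], mu: List[float],
                                  lam: float, total_cap: float) -> float:
    """Mean execution time T_cs of an M/M/1 server with capacity C."""
    work = sum(a / u for a, u in zip(alpha, mu))    # sum_i alpha_i / mu_i
    if total_cap <= work:
        raise ValueError("unstable server: need C > sum(alpha_i / mu_i)")
    return work / (lam * (total_cap - work))


# Made-up numbers: alpha = (3, 2), mu_i = 1, lambda = 3, C = 9.
mu_c = centralized_rate_parameter([3.0, 2.0], [1.0, 1.0], 3.0)
t_cs = centralized_single_stage_time([3.0, 2.0], [1.0, 1.0], 3.0, 9.0)
t_cs_direct = 1.0 / (mu_c * 9.0 - 3.0)              # 1 / (mu C - lambda)
```

Both `t_cs` and `t_cs_direct` come out to 5/12 for these values, which is a quick check that the two forms of the expression agree.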


Given specific values for the various parameters, these equations can be used to calculate
and compare execution times of a given algorithm. The proof of part (i) of the
proposition, which is stated for non-specific values of parameters, proceeds by first minimizing
over $C_i$ the ratio $R_{(d,cs)}$ of the execution times. The constraints for the minimization
problem are $\sum_{i=1}^{m} C_i = C$ and $C_i > 0$, $i = 1, 2, \ldots, m$. Since $C$ is assumed to be constant,
the problem of minimizing the ratio with these constraints is equivalent to the problem of
minimizing $T_d$ with the same constraints. The latter minimization problem is the one that
we consider next. If all queues are stable, i.e., $\alpha_i/\mu_i C_i < 1$, then the objective function
is convex over $C_i$, as can be shown by twice differentiating $Y_i$. Hence, the minimization
problem has a unique solution.

The minimization problem, solved using the method of Lagrange multipliers, is similar
to the capacity assignment (CA) problem considered for computer communication networks
[p. 330, 4]. The Lagrangian function is

$$G = T_d + \beta \left( \sum_{i=1}^{m} C_i - C \right)$$

where $\beta$ is the Lagrangian multiplier. Forming the partial derivatives $\partial G / \partial C_i$ for all $i$, and
equating them to zero, we get the optimal values of $C_i$ in terms of the unknown parameter
$\beta$. By using the constraint equation $\sum_{i=1}^{m} C_i = C$ to eliminate the unknown parameter
$\beta$, we obtain the optimal values of $C_i$ as

$$C_i^* = \frac{\alpha_i}{\mu_i} + C_e \frac{\sqrt{\alpha_i/\mu_i}}{\sum_{j=1}^{m} \sqrt{\alpha_j/\mu_j}}$$

where $C_e = C - \sum_{i=1}^{m} (\alpha_i/\mu_i)$. The assumption that all queues are stable guarantees that
$C_e > 0$, and that the minimization is feasible. From Eq. (6), one finds that the condition
$C_e > 0$ also guarantees that the single server of the centralized system is stable.




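By substituting the optimal value of capacities, $C_i^*$, in Eqs. (2), (3), and (4), we have

$$\min T_d = \frac{\left[ \sum_{i=1}^{m} \sqrt{\alpha_i/\mu_i} \right]^2}{\lambda C - \lambda \sum_{i=1}^{m} (\alpha_i/\mu_i)}. \tag{7}$$

From Eqs. (6) and (7), the minimum value of the ratio is given as

$$\min R_{(d,cs)} = \frac{\left[ \sum_{i=1}^{m} \sqrt{\alpha_i/\mu_i} \right]^2}{\sum_{i=1}^{m} (\alpha_i/\mu_i)}.$$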


Consider the r.h.s. of the above equation. With $x_i = \alpha_i/\mu_i$, the numerator expands as
$\left[ \sum_i \sqrt{x_i} \right]^2 = \sum_i x_i + 2 \sum_{i<j} \sqrt{x_i x_j}$. As the expansion of the numerator contains the
denominator terms plus some other positive terms, we conclude that the ratio is greater
than one for all possible values of parameters $C_i$, $\mu_i$, and $\lambda_r$. This completes the proof of
part (i) of the Proposition.
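
The capacity assignment used in this proof can be checked numerically. The sketch below is a minimal illustration under the assumptions of this section (M/M/1 queues, $\sum_i C_i = C$); the function names are hypothetical, and the formula for the optimal capacities is the square-root assignment that follows from the Lagrangian conditions.

```python
import math
from typing import List


def optimal_capacities(alpha: List[float], mu: List[float],
                       total_cap: float) -> List[float]:
    """Capacities C_i* minimizing T_d subject to sum(C_i) = C."""
    work = [a / u for a, u in zip(alpha, mu)]           # alpha_i / mu_i
    excess = total_cap - sum(work)                      # excess capacity C_e
    if excess <= 0:
        raise ValueError("infeasible: need C > sum(alpha_i / mu_i)")
    root_sum = sum(math.sqrt(w) for w in work)
    return [w + excess * math.sqrt(w) / root_sum for w in work]


def distributed_time(alpha: List[float], mu: List[float],
                     cap: List[float], lam: float) -> float:
    """T_d = sum_i (alpha_i / lam) / (mu_i C_i - alpha_i)."""
    return sum((a / lam) / (u * c - a) for a, u, c in zip(alpha, mu, cap))


def min_ratio_d_cs(alpha: List[float], mu: List[float]) -> float:
    """Minimum of T_d / T_cs: (sum sqrt(x_i))^2 / sum(x_i), x_i = alpha_i/mu_i."""
    work = [a / u for a, u in zip(alpha, mu)]
    return sum(math.sqrt(w) for w in work) ** 2 / sum(work)


# Made-up numbers: alpha = (3, 2), mu_i = 1, lambda = 3, C = 9.
alpha, mu, lam, total = [3.0, 2.0], [1.0, 1.0], 3.0, 9.0
caps = optimal_capacities(alpha, mu, total)
t_d_min = distributed_time(alpha, mu, caps, lam)
t_cs = sum(alpha) / (lam * (total - sum(alpha)))        # valid here since mu_i = 1
```

For these values `t_d_min / t_cs` agrees with `min_ratio_d_cs(alpha, mu)`, and the ratio exceeds one as the proof asserts.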


Part (ii)

To prove part (ii) of the Proposition, let us consider an arbitrary PERT/G/m algorithm.
In the model of its distributed system, a single request may generate more than
one job at a time. This leads to a network of queues model which has not been shown to possess
a product-form solution; this makes it impossible to find a closed-form solution for $T_d$.
Nonetheless, it is possible to prove part (ii) by making use of the following Lemma.

Lemma 1: From a given PERT/G/m algorithm, we define a new algorithm which
has the same tasks, but a purely parallel topology. By abuse of notation, we shall use
PERT/PP(G)/m to denote this new algorithm. We claim that

$$T_{d/PERT/G} > T_{d/PERT/PP(G)}$$

Proof: Let $\tilde{T}_{d/PERT/G}$ be the execution time of a request to a d/PERT/G system.
$\tilde{T}_{d/PERT/G}$ is a random variable which can be expressed as a series of additions and maximizations
of individual delay times at processor nodes, say $\{Y_i\}$. In general, $\{Y_i\}$ are
dependent random variables. For instance, for the example of Figure 4 (b), we have

^dfPERTfG  =  niax(Ki  +  Fa  +  max(K4,  F5)  +  Fi  +  Yi,  Y1  +  Y2  +  Yj) 

For a general PERT structure, $\tilde{T}_{d/PERT/G}$ is a function $\Psi(Y_1, Y_2, \ldots, Y_m)$ which depends
on the topology of the structure. Let $\tilde{T}_{d/PERT/PP(G)}$ be the execution time of a request
to the d/PERT/PP(G) system. It is given by

$$\tilde{T}_{d/PERT/PP(G)} = \max(Y_1, Y_2, \ldots, Y_m) \tag{8}$$

Then, independent of the function $\Psi$, one can show that $\tilde{T}_{d/PERT/G} > \tilde{T}_{d/PERT/PP(G)}$,
and thus

$$T_{d/PERT/G} = E[\tilde{T}_{d/PERT/G}] > E[\tilde{T}_{d/PERT/PP(G)}] = T_{d/PERT/PP(G)}$$

which proves the lemma.

Next, we shall establish that $T_{d/PERT/PP(G)} > T_{cs/PERT/G}$. First of all, we note that
from Eq. (8) we have

$$T_{d/PERT/PP(G)} = E[\tilde{T}_{d/PERT/PP(G)}] > \max(E[Y_1], E[Y_2], \ldots, E[Y_m]) \tag{9}$$

Since in a d/PERT/PP system, random variables $Y_i$ are independent, and each processor
has a rate of $\lambda$ arrivals per second, the average execution time for the $i$th processor is

$$E[Y_i] = \frac{1}{\mu_i C_i - \lambda} \tag{10}$$

The r.h.s. of Eq. (9) is minimized over $C_i$ with the constraint $\sum_{i=1}^{m} C_i = C$ when

$$E[Y_1] = E[Y_2] = \cdots = E[Y_m]$$

which means that

$$T_{d/PERT/PP(G)} > \frac{1}{\mu_1 C_1 - \lambda} = \cdots = \frac{1}{\mu_m C_m - \lambda} \tag{11}$$

From Eq. (11), one finds that the optimal capacities are

$$C_i = \frac{\mu_1}{\mu_i} C_1 \quad \text{for } i = 2, \ldots, m$$

By substituting these values in the constraint equation, we have

$$\sum_{i=1}^{m} \frac{\mu_1}{\mu_i} C_1 = C \tag{12}$$


The average execution time in the corresponding centralized system is found easily
since the model is an M/M/1 system with arrival rate $\lambda$ and service rate $\mu C$, where the
parameter $\mu$ is given by

$$\frac{1}{\mu} = \sum_{i=1}^{m} \frac{1}{\mu_i}$$

The average execution time $T_{cs/PERT/G}$ is given by

$$T_{cs/PERT/G} = \frac{1}{\mu C - \lambda} = \frac{\sum_{i=1}^{m} (1/\mu_i)}{C - \lambda \sum_{i=1}^{m} (1/\mu_i)} \tag{13}$$

From Eqs. (11), (12) and (13), we conclude that

$$T_{d/PERT/PP(G)} > T_{cs/PERT/G}$$

This inequality along with that of Lemma 1 proves part (ii) of the proposition.

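
The last step of this argument can be illustrated numerically: once the capacities are chosen so that every $\mu_i C_i$ is equal, the lower bound on the distributed delay coincides with the centralized M/M/1 delay. The sketch below assumes the PERT setting of this section, where every request visits all $m$ processors; the function names are hypothetical.

```python
from typing import List


def equalizing_capacities(mu: List[float], total_cap: float) -> List[float]:
    """Capacities with mu_i * C_i equal for all i and sum(C_i) = C."""
    inv_sum = sum(1.0 / u for u in mu)
    common_rate = total_cap / inv_sum          # common value of mu_i * C_i
    return [common_rate / u for u in mu]


def pert_pp_lower_bound(mu: List[float], lam: float, total_cap: float) -> float:
    """Lower bound 1 / (mu_1 C_1 - lambda) at the equalizing capacities."""
    caps = equalizing_capacities(mu, total_cap)
    rate = mu[0] * caps[0]
    if rate <= lam:
        raise ValueError("unstable: need C / sum(1/mu_i) > lambda")
    return 1.0 / (rate - lam)


def centralized_pert_time(mu: List[float], lam: float, total_cap: float) -> float:
    """M/M/1 delay of the single server: S / (C - lambda S), S = sum(1/mu_i)."""
    inv_sum = sum(1.0 / u for u in mu)
    if total_cap <= lam * inv_sum:
        raise ValueError("unstable: need C > lambda * sum(1/mu_i)")
    return inv_sum / (total_cap - lam * inv_sum)


# Made-up numbers: mu = (1, 2, 4), lambda = 1, C = 7.
caps = equalizing_capacities([1.0, 2.0, 4.0], 7.0)
bound = pert_pp_lower_bound([1.0, 2.0, 4.0], 1.0, 7.0)
t_cs = centralized_pert_time([1.0, 2.0, 4.0], 1.0, 7.0)
```

Here the capacities are (4, 2, 1), and both `bound` and `t_cs` equal 1/3, so the strict inequality on the distributed side carries over to the comparison with the centralized system.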
4.2  Centralized  Systems  with  Multi-stage  Service

When  the  service  of  a  centralized  system  is  modelled  as  a  multi-stage  service,  we  show 
that  the  result  of  the  comparison  between  distributed  and  centralized  systems  is  not  always 
in  favor  of  centralized  systems.  We  shall  first  establish  a  result  that  identifies  a  range  of 
values  of  parameters  for  which  centralized  systems  are  always  better,  and  then  consider 
an  example  where  distributed  systems  are  better. 

Consider in this section systems which are such that (1) the arrival process is Poisson,
(2) all processors use the FCFS service discipline, (3) service distributions of all tasks are
exponential, (4) the resources are limited such that $\sum_{i=1}^{m} C_i = C$, and (5) the service
model in the centralized system is multi-stage. Let $C_b$ be the coefficient of variation of
the service time in the centralized system. Under these conditions, the average execution
time in a distributed system for a SIFM algorithm is higher than that of a centralized
system executing the same algorithm, provided the structure of the algorithm, the arrival
distribution and the service distributions are such that $C_b^2 < 1$. As we show later in this
section, if $C_b^2 > 1$ then the above statement is not always true. On the other hand, for a PERT
algorithm, centralized systems are always better.

Proposition 2

(i) if $C_b^2 < 1$ then $T_{d/SIFM/G/m/M/M/FCFS} \ge T_{cm/SIFM/G/m/M/M/FCFS}$

(ii) $T_{d/PERT/G/m/M/M/FCFS} > T_{cm/PERT/G/m/M/M/FCFS}$

Proof: 

Part  (i) 

To prove part (i) of this proposition, let us consider an arbitrary SIFM/G/m algorithm.
The average execution time in a distributed system, $T_d$, is as in Section 4.1 (see Eqs. (2)-(4)).

We model the centralized system as an M/G/1 system with the average service time
($1/\mu C$), and the coefficient of variation ($C_b$) given as

$$\frac{1}{\mu C} = \sum_{i=1}^{m} \frac{\alpha_i}{\lambda} \frac{1}{\mu_i C}$$


and 

Applying the P-K mean value formula for the average response time of an M/G/1 system
[p. 191, 8], we get the average execution time of a centralized system as

$$T_{cm} = \frac{1}{\mu C} + \frac{\rho (1 + C_b^2)}{2 \mu C (1 - \rho)}$$

where $\rho = \lambda/\mu C$ is the utilization factor. By rearranging the terms of the r.h.s. of the
above equation, we have

$$T_{cm} = \frac{1}{\mu C (1 - \rho)} \left[ 1 + \frac{\rho}{2} (C_b^2 - 1) \right] \tag{14}$$

Since $\mu$ and $\lambda$ are the same for the single-stage model as for the multi-stage model, we
use Eq. (6) to derive the following relation between the average execution times of these two
models.

$$T_{cm} = T_{cs} \left( 1 + \frac{\rho}{2} (C_b^2 - 1) \right) \tag{15}$$

The utilization factor $\rho$ is always non-negative. Hence, if $C_b^2 < 1$ then $T_{cs} > T_{cm}$. This
inequality and part (i) of Proposition 1 prove part (i) of this proposition.
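
The relation between the multi-stage and single-stage delays can be verified numerically. The following sketch is a minimal check, with hypothetical function names, that the P-K mean value formula and the form $T_{cs}\,(1 + (\rho/2)(C_b^2 - 1))$ give the same number; `mean_service` stands for $1/\mu C$ and `cb2` for the squared coefficient of variation.

```python
def pk_mean_time(lam: float, mean_service: float, cb2: float) -> float:
    """Mean response time of an M/G/1 queue (P-K mean value formula)."""
    rho = lam * mean_service
    if not 0.0 <= rho < 1.0:
        raise ValueError("need 0 <= rho < 1")
    return mean_service + rho * mean_service * (1.0 + cb2) / (2.0 * (1.0 - rho))


def t_cm_from_t_cs(lam: float, mean_service: float, cb2: float) -> float:
    """Same quantity written as T_cs * (1 + (rho/2)(cb2 - 1))."""
    rho = lam * mean_service
    if not 0.0 <= rho < 1.0:
        raise ValueError("need 0 <= rho < 1")
    t_cs = mean_service / (1.0 - rho)          # M/M/1 delay 1 / (mu C - lambda)
    return t_cs * (1.0 + 0.5 * rho * (cb2 - 1.0))


# Made-up numbers: lambda = 0.5, 1/(mu C) = 1, squared coefficient 0.25.
t_pk = pk_mean_time(0.5, 1.0, 0.25)
t_15 = t_cm_from_t_cs(0.5, 1.0, 0.25)
```

Both expressions give 1.625 here, below the single-stage value of 2, and setting `cb2 = 1.0` recovers the M/M/1 delay exactly.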

Part  (ii) 

To prove part (ii), consider an arbitrary PERT/G/m algorithm. The average execution
time in a distributed system, $T_d$, has not changed from that of Section 4.1 (see Eqs. (8)-(11)).

For any PERT algorithm, the multi-stage service model is hypo-exponential, i.e., a
single series of exponential stages. The average service time ($1/\mu C$) is

$$\frac{1}{\mu C} = \sum_{i=1}^{m} \frac{1}{\mu_i C}$$

Since the service time distribution is hypo-exponential, the coefficient of variation ($C_b$) is
never greater than one.

The $\mu$ and $\lambda$ are the same for the single-stage and for the multi-stage model. Hence, the
relationship between $T_{cm}$ and $T_{cs}$ as given by Eq. (15) is valid. And, since $\rho$ is always non-negative,
and $C_b^2$ is never greater than one, we have $T_{cm} \le T_{cs}$ for any PERT algorithm.
This inequality and part (ii) of Proposition 1 prove part (ii) of this proposition.

QED 

We now investigate the relative performance of the systems when $C_b^2 > 1$ by considering
an example of a SIFM/PP/2 algorithm. The observations made for this case can be
extrapolated to a SIFM/PP/m algorithm. Note that when $\mu_i = \mu$, $i = 1, 2$, the coefficient
of variation is one. Hence, from Eq. (15) it is obvious that centralized systems
are always better if the $\mu_i$'s are equal. This is the result that was shown by Kleinrock [4].

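Our objective is to explore the case where $\mu_1 \ne \mu_2$, and hence $C_b^2 > 1$. Using Eqs. (2),
(3), and (4) to get $T_d$, and Eqs. (6), (14), and (15) to get $T_{cm}$, we get the ratio $R_{(d,cm)}$.
Minimizing the ratio over $C_i$ with the constraint $C_1 + C_2 = C$ gives

$$\min R_{(d,cm)} = \frac{\left[ \sum_{i=1}^{2} \sqrt{\lambda_i/\lambda \mu_i} \right]^2}{\left[ \sum_{i=1}^{2} (\lambda_i/\lambda \mu_i) \right] \left[ 1 + (\rho/2)(C_b^2 - 1) \right]} \tag{16}$$

where

$$C_b^2 = \frac{2 \sum_{i=1}^{2} (\lambda_i/\lambda \mu_i^2)}{\left[ \sum_{i=1}^{2} (\lambda_i/\lambda \mu_i) \right]^2} - 1 \tag{17}$$

and

$$\rho \triangleq \lambda/\mu C = (1/C) \sum_{i=1}^{2} \frac{\lambda_i}{\mu_i}$$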

Now, we shall show that for some values of the parameters, $\min R_{(d,cm)} < 1$. Note that as
long as $\mu_1 \ne \mu_2$, $C_b^2 > 1$. Let us select $\lambda_1/\lambda = \epsilon$, $\mu_1 =$ […], and $\mu_2 = 1$. For $\epsilon \ll 1$, this
assignment of values corresponds to a situation where a large number of requests require
very little work to be done, i.e., arrive to processor $P_1$. Substituting the above values
in Eqs. (16) and (17), we can easily show that as $\epsilon \to 0$, and as $\rho \to 1$, $R_{(d,cm)} \to$ […]. In
Figure 11, we plot $R_{(d,cm)}$ versus $\epsilon$, i.e., $\lambda_1/\lambda$, for $\rho = 0.1$, $0.5$, and $0.9$. These curves clearly
show that in this example there are many values of $\lambda_1/\lambda$ for which the distributed system
is better than the centralized system. Hence we conclude that for the given assumptions,
part (i) of Proposition 2 cannot be extended to all SIFM algorithms.
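
A small numerical experiment makes the counterexample concrete. The sketch below treats the multi-stage service time of a SIFM/PP/2 algorithm as a two-branch mixture of exponentials and evaluates the minimized ratio of Eq. (16). The function names and the parameter values are assumptions chosen for illustration; they are not the values used for Figure 11.

```python
import math


def cb2_two_branch(p1: float, mu1: float, mu2: float) -> float:
    """Squared coefficient of variation of a two-branch exponential mixture.

    p1 plays the role of lambda_1 / lambda; the capacity C cancels out.
    """
    probs, rates = [p1, 1.0 - p1], [mu1, mu2]
    first = sum(p / u for p, u in zip(probs, rates))
    second = 2.0 * sum(p / u ** 2 for p, u in zip(probs, rates))
    return second / first ** 2 - 1.0


def min_ratio_d_cm(p1: float, mu1: float, mu2: float, rho: float) -> float:
    """Minimized ratio of distributed to multi-stage centralized delay."""
    probs, rates = [p1, 1.0 - p1], [mu1, mu2]
    first = sum(p / u for p, u in zip(probs, rates))
    root_sum = sum(math.sqrt(p / u) for p, u in zip(probs, rates))
    stage_factor = 1.0 + 0.5 * rho * (cb2_two_branch(p1, mu1, mu2) - 1.0)
    return root_sum ** 2 / (first * stage_factor)


# Hypothetical values: 10% of requests go to a processor with mu_1 = 0.01.
ratio_heavy = min_ratio_d_cm(0.1, 0.01, 1.0, 0.9)
ratio_equal = min_ratio_d_cm(0.5, 1.0, 1.0, 0.9)   # equal mu_i, so C_b^2 = 1
```

With these hypothetical values `ratio_heavy` comes out below one (the distributed system wins), while `ratio_equal` is 2, consistent with the remark that equal $\mu_i$ always favor the centralized system.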


4.3  Non-Exponential  Service  Times

In  the  previous  sections,  the  service  received  by  a  job  at  a  processor  in  the  distributed 
system  was  assumed  to  be  exponentially  distributed.  The  assumption  may  be  appropriate 
for  tasks  which  involve  data-dependent  iterative  operations.  But,  there  are  tasks  which 
involve a constant number of operations. Hence, we consider in this section the cases where
the $B_i(x)$ are non-exponential distributions.

The first case that we consider is that of a system which uses either the LCFS-PR or the PS
service discipline. We consider these disciplines first because the model of the distributed
system of a SIFM/G/m algorithm has a product-form solution, and hence is easy to
analyze.

The systems that we consider in this section are such that (1) the arrival process
is Poisson, (2) processors use either the LCFS-PR or the PS service discipline, (3) service time
distributions are general, (4) resources are limited such that $\sum_{i=1}^{m} C_i = C$, and (5) the multi-stage
service model is used. With these assumptions, the average execution time for a
distributed system of a SIFM or a PERT algorithm is higher than that of a centralized
system executing the same algorithm, regardless of the actual distributions or the algorithm
structure.

Figure 11. A performance comparison of the distributed and the centralized systems of
a SIFM/PP/2 algorithm with $\mu_2 = 1$, $\mu_1 = (\lambda_1/\lambda)^{[…]}$, and $\lambda_1 + \lambda_2 = \lambda$. (Vertical axis: min $R_{(d,cm)}$.)

Proposition 3

Using X service discipline to mean that processors use either the LCFS-PR
or the PS service discipline, we can state the above proposition more concisely as

(i) $T_{d/SIFM/G/m/M/G/X} \ge T_{cm/SIFM/G/m/M/G/X}$

(ii) $T_{d/PERT/G/m/M/G/X} > T_{cm/PERT/G/m/M/G/X}$

The proof of this proposition hinges on two facts. The first fact to note is that the
network of queues model of a d/SIFM/G/m/M/G/X system has a product-form solution
[7]. Hence, the average execution time at a processor, i.e., $Y_i$, is given as if the $i$th queue
is running independently with Poisson arrivals at a rate $\alpha_i$. The second fact to note is
that the average system delay of an M/G/1 system with either the LCFS-PR or the PS service
discipline depends only on the first moment of its service time distribution [8].

Proof: 

Part  (i) 

To prove part (i), let us consider a SIFM/G/m algorithm. Since the model of the distributed
system has a product-form solution, the $i$th processor behaves as an independent
M/G/1 system with either a LCFS-PR or a PS service discipline, the arrival rate of $\alpha_i$,
and a general service distribution with the first moment denoted by ($1/\mu_i C_i$). Let $Y_i$ be
the average execution time of a job at the $i$th processor. For such a model, it has been
shown in [8] that

$$Y_i = \frac{1}{\mu_i C_i - \alpha_i}$$

As this equation is identical to Eq. (4), the rest of the proof is similar to that of part (i) of
Proposition 1.

Part  (ii) 

To prove part (ii), let us consider a PERT/G/m algorithm. First of all, we note that
Lemma 1 is valid for any service time distribution. Hence, we have

$$T_{d/PERT/G} > T_{d/PERT/PP(G)}$$

In the d/PERT/PP(G) system, each processor is modelled as an M/G/1 system with either
a LCFS-PR or a PS service discipline, arrival rate $\lambda$, and a general service time distribution
with the first moment denoted by ($1/\mu_i C_i$). Let $Y_i$ be the average execution time of a job
at the $i$th processor. As noted above for this system, $Y_i$ is given by

$$Y_i = \frac{1}{\mu_i C_i - \lambda}$$

As this equation is identical to Eq. (10), the rest of the proof is similar to that of part (ii) of
Proposition 1.

QED 

It  is  difficult  to  obtain  a  similar  result  for  a  SIFM/G/m  algorithm  with  general  service 
distributions  and  FCFS  service  disciplines  at  processors,  since  the  model  of  its  distributed 
system  does  not  have  a  product-form  solution.  But,  it  is  possible  to  find  closed-form 
solutions  for  some  particular  systems,  e.g.  d/SIFM/PP  systems.  In  the  following,  we  shall 
consider  a  specific  case,  namely,  a  SIFM/PP/2/M/D/FCFS  system. 

Both processors of the distributed system are modelled as M/D/1 systems with the FCFS
service discipline. The arrival rate is $\lambda_i$ and the service rate is $\mu_i C_i$ at
processor $P_i$, $i = 1, 2$. Hence, the average execution time ($Y_i$) at the $i$th processor is

$$Y_i = \frac{2 \mu_i C_i - \lambda_i}{2 \mu_i C_i (\mu_i C_i - \lambda_i)}$$

and the average execution time ($T_d$) of a request is

$$T_d = \sum_{i=1}^{2} \frac{\lambda_i}{\lambda} Y_i = \sum_{i=1}^{2} \frac{\lambda_i}{\lambda} \frac{2 \mu_i C_i - \lambda_i}{2 \mu_i C_i (\mu_i C_i - \lambda_i)}$$


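The processor of the centralized system is also modelled as an M/D/1 system with the
FCFS service discipline. The arrival rate is $\lambda$ and the service rate is $\mu C$, where the
parameter $\mu$ is given by

$$\frac{1}{\mu} = \sum_{i=1}^{2} \frac{\lambda_i}{\lambda} \frac{1}{\mu_i}$$

The average execution time ($T_{cm}$) is

$$T_{cm} = \frac{2 \mu C - \lambda}{2 \mu C (\mu C - \lambda)}$$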
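
As a sanity check on the M/D/1 expression, the sketch below compares it with the M/M/1 delay at the same arrival and service rates, using the standard fact that deterministic service halves the mean waiting time in queue. The function names are hypothetical, and `service_rate` stands for $\mu C$ (or $\mu_i C_i$ for a single processor).

```python
def md1_mean_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system of an M/D/1 FCFS queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: need arrival_rate < service_rate")
    return (2.0 * service_rate - arrival_rate) / (
        2.0 * service_rate * (service_rate - arrival_rate))


def mm1_mean_time(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system of an M/M/1 FCFS queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: need arrival_rate < service_rate")
    return 1.0 / (service_rate - arrival_rate)


# Made-up numbers: arrival rate 1, service rate 2.
service_time = 1.0 / 2.0
wait_md1 = md1_mean_time(1.0, 2.0) - service_time   # time spent queueing
wait_mm1 = mm1_mean_time(1.0, 2.0) - service_time
```

For these values the M/D/1 delay is 0.75 against 1.0 for M/M/1, and `wait_md1` is exactly half of `wait_mm1`.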


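Thus, the ratio is

$$R_{(d,cm)} = \frac{\sum_{i=1}^{2} (\lambda_i/\lambda \mu_i C_i)(2 \mu_i C_i - \lambda_i)/(\mu_i C_i - \lambda_i)}{(1/\mu C)(2 \mu C - \lambda)/(\mu C - \lambda)}$$

The number of independent parameters in the above equation can be reduced with the
definition of a new parameter, $K = (\lambda_1/\mu_1)/(\lambda_2/\mu_2)$, which is called the design ratio. The
ratio $R_{(d,cm)}$, expressed in terms of the parameters $K$, $\rho$, and $C_i$, $i = 1, 2$, is

$$R_{(d,cm)} = \frac{\sum_{i=1}^{2} (a_i C/C_i)(2 C_i - \rho a_i C)/(C_i - \rho a_i C)}{(2 - \rho)/(1 - \rho)} \tag{18}$$

where $a_1 = K/(1 + K)$, and $a_2 = 1/(1 + K)$.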

We shall show that for this case the centralized system is always better than the
distributed system. In order to show this we shall first consider the minimization of the
ratio over $C_i$ with the constraint $C_1 + C_2 = C$. Let $F(K, \rho, C_1, C_2)$ denote the numerator
of the r.h.s. of Eq. (18). The above minimization problem is equivalent to the problem of
minimizing $F(\cdot)$ over $C_i$ for the same constraint. The function $F(\cdot)$ is convex over $C_i$, as can
be shown by twice differentiating it with respect to $C_i$. Using the Lagrangian method gives
a set of fourth-degree polynomial equations that, in theory, can be simultaneously solved.
In practice, the solution is too unwieldy. Hence, we chose to use the convex programming
method. We calculated the minimum of $F(\cdot)$ (and hence of $R_{(d,cm)}$) for a given $\rho$ and $K$
using a gradient-search type of algorithm. In Figure 12, we show the minimum values of
$R_{(d,cm)}$ obtained for various values of the design ratio $K$ and $\rho$. There is very little
variation observed as $\rho$ varies in the range 0 to 1. For any value of $\rho$, as $K$ varies in the
range $(0, \infty)$ the minimum value of the ratio is always greater than one. Also, using Eq.
(18), it can be shown that as $K \to 0$ or $K \to \infty$, $\min R_{(d,cm)} \to 1$. Thus, for this specific
example the centralized system is better than the distributed system. Given this result
about a SIFM/PP/2/M/D/FCFS system, it is easy to show that the same result is true
for a SIFM/PP/m/M/D/FCFS system.
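
The numerical minimization can be reproduced with very little code. The sketch below is not the gradient-search procedure of the report: it swaps in a one-dimensional ternary search, which is sufficient because $C_2 = C - C_1$ and the objective is convex on the stability interval. It also assumes capacities are expressed as fractions of $C$ (so $c_1 + c_2 = 1$); all function names are hypothetical.

```python
import math


def ratio_md1(k: float, rho: float, c1: float) -> float:
    """Ratio of distributed to centralized M/D/1 delay for capacity share c1."""
    shares = [k / (1.0 + k), 1.0 / (1.0 + k)]      # a_1 and a_2
    caps = [c1, 1.0 - c1]                          # c_1 + c_2 = 1
    numerator = 0.0
    for a_i, c_i in zip(shares, caps):
        if c_i <= rho * a_i:
            return math.inf                        # queue i would be unstable
        numerator += (a_i / c_i) * (2.0 * c_i - rho * a_i) / (c_i - rho * a_i)
    return numerator / ((2.0 - rho) / (1.0 - rho))


def min_ratio_md1(k: float, rho: float, iterations: int = 200) -> float:
    """Minimize ratio_md1 over c1 by ternary search on the stability interval."""
    lo = rho * k / (1.0 + k)                       # c1 must exceed rho * a_1
    hi = 1.0 - rho / (1.0 + k)                     # c2 must exceed rho * a_2
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if ratio_md1(k, rho, m1) < ratio_md1(k, rho, m2):
            hi = m2
        else:
            lo = m1
    return ratio_md1(k, rho, 0.5 * (lo + hi))


# Sweep a few hypothetical design ratios and utilizations.
sweep = {(k, rho): min_ratio_md1(k, rho)
         for k in (0.01, 0.1, 1.0, 10.0, 100.0)
         for rho in (0.1, 0.5, 0.9)}
```

Every entry of `sweep` stays above one, in line with the conclusion that the centralized system is better in this example; for $K = 1$ and $\rho = 0.5$ the symmetric split $c_1 = c_2 = 1/2$ gives a ratio of 2.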


Figure 12. A SIFM/PP/2/M/D/FCFS system. (Axes: design ratio $K = (\lambda_1/\mu_1)/(\lambda_2/\mu_2)$ and min $R_{(d,cm)}$.)

V.  Conclusions


In  this  report,  we  have  considered  the  relative  performance  of  distributed  processing 
and  centralized  processing  of  an  algorithm.  The  algorithms  were  abstracted  as  consisting 
of  a  number  of  tasks  with  precedence  constraints  among  them.  The  types  of  precedence 
relationships  were  such  as  to  allow  pipelined  processing  of  various  requests,  and  concurrent 
processing  of  tasks  for  a  request  in  a  distributed  processing  system.  Our  main  assumption 
about  the  processing  systems  used  was  that  the  processing  capacity  of  a  centralized  system 
is  the  sum  of  capacities  of  processors  in  a  distributed  system. 

In  this  report,  we  have  shown  that,  given  the  constraint  on  the  capacities  of  processors, 
the  centralized  system  does  not  always  outperform  the  distributed  system.  The  result 
differs  from  the  one  obtained  in  [5].  The  difference  is  not  due  to  a  general  model  of 
relationship  among  tasks  proposed  here,  but  is  due  to  our  use  of  an  exact  model  for 
a  centralized  system,  namely  the  multi-stage  model.  We  have  also  shown  the  range  of 
parameter  values  for  which  distributed  systems  are  always  worse,  and  have  given  examples 
for  which  a  distributed  system  is  better.  Even  though  we  have  found  specific  cases  of 
algorithms  for  which  the  distributed  processing  outperforms  centralized  processing,  in 
most  cases  the  opposite  result  holds  true.  Hence,  a  good  heuristic  rule  of  design  is  to  use 
a  centralized  processing  system  as  long  as  the  technological  considerations  do  not  prevent 
the  manufacturing  of  the  processor  of  the  required  capacity,  and  as  long  as  the  linear  cost 
constraint  is  applicable. 

The  approach  used  here  in  modelling  distributed  processing  is  a  new  one.  The  question 
that  arises  in  this  context  is  whether  one  could  use  this  approach  for  achieving  a  general 
theory of distributed processing that will help in designing distributed systems. The answer
is not very encouraging, primarily because of the following shortcomings. The major
problem is the inability to analyze exactly the algorithms that use Fork and Join relationships,
and the algorithms that have general service distributions (and FCFS disciplines
for each processor of the systems). Another problem with the model is that it ignores the
impact of communications among tasks. We hope to carry out further research to deal
with some of these issues.

Appendix I

The following is a summary of the notation used in the report.

$P_i$          a processor in a distributed system
$F_i$          task assigned to processor $P_i$
$E_i$          a request to execute an algorithm
$J_{i,j}$      a job to execute task $F_j$ for a request $E_i$
$1/\mu_i$      mean number of operations performed for a job $J_{*,i}$
$C_i$          total capacity (in operations/sec) of processor $P_i$
$m$            the total number of tasks in an algorithm
$r$            a class of a request
$R$            the total number of classes
$a_r$          route of a request of class $r$
$\lambda$      external arrival rate in requests/second
$\lambda_r$    external arrival rate of requests of class $r$
$\alpha_i$     total arrival rate to processor $P_i$
$Z_r$          mean execution time for a request of class $r$
$Y_i$          mean execution time of a job $J_{*,i}$ at processor $P_i$
$C$            total capacity of the processor in a centralized system
$1/\mu$        mean number of operations performed for a request in a centralized system
$C_b$          coefficient of variation of service time in a centralized system
$T_d$          mean execution time for a request in a distributed system
$T_{cs}$       mean execution time for a request in a centralized system with single-stage service
$T_{cm}$       mean execution time for a request in a centralized system with multi-stage service
$R_{(d,cs)}$   $= T_d/T_{cs}$
$R_{(d,cm)}$   $= T_d/T_{cm}$